Automatically Finding Significant Topical Terms from Documents

Authors

  • Quanzhi Li
  • Yi-fang Brook Wu
  • Razvan Stefan Bot
  • Xin Chen
Abstract

With the proliferation of digital textual data, text mining is becoming increasingly important for gaining competitive advantage. One factor in successful text mining applications is the ability to find significant topical terms for discovering interesting patterns or relationships. Document keyphrases are phrases that carry the most important topical concepts of a given document. In many applications, keyphrases are better suited to text mining than single words and can provide more discriminating power. This paper describes an automatic keyphrase identification program (KIP). KIP’s algorithm examines the composition of noun phrases and calculates their scores by looking them up in a domain-specific glossary database; those with higher scores are extracted as keyphrases. KIP’s learning function can enrich the glossary database by automatically adding newly identified keyphrases. KIP’s personalization feature allows users to build a glossary database specifically suited to their area of interest.
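
The abstract gives no scoring formula, so the following is only a minimal sketch of the general idea (score candidate noun phrases against a glossary, keep the top scorers, then add them back to the glossary). Everything specific here is an assumption made for illustration: the function names `score_phrase`, `extract_keyphrases`, and `learn`, the additive scoring rule, the weights, and the premise that noun phrases have already been extracted by some other component. It is not KIP's actual algorithm.

```python
from typing import Dict, List, Tuple


def score_phrase(phrase: str, glossary: Dict[str, float]) -> float:
    """Hypothetical score: weight of the whole phrase (if listed in the
    glossary) plus the weights of its individual words (assumed rule)."""
    key = phrase.lower()
    words = key.split()
    score = glossary.get(key, 0.0)
    if len(words) > 1:
        # Only multi-word phrases get the per-word contribution, so a
        # single-word entry is not counted twice.
        score += sum(glossary.get(w, 0.0) for w in words)
    return score


def extract_keyphrases(
    candidates: List[str], glossary: Dict[str, float], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank candidate noun phrases by score and keep the top_k with score > 0."""
    scored = {p.lower(): score_phrase(p, glossary) for p in candidates}
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(p, s) for p, s in ranked if s > 0][:top_k]


def learn(
    glossary: Dict[str, float],
    keyphrases: List[Tuple[str, float]],
    default_weight: float = 1.0,
) -> None:
    """Assumed learning step: add newly identified keyphrases to the glossary
    without overwriting existing entries."""
    for phrase, _ in keyphrases:
        glossary.setdefault(phrase, default_weight)


# Illustrative glossary and candidates; the weights are made up.
glossary = {
    "text mining": 3.0, "keyphrase": 2.0, "glossary": 1.0,
    "database": 1.0, "mining": 1.0, "text": 0.5,
}
candidates = ["text mining", "glossary database", "competitive advantages", "keyphrase"]

keyphrases = extract_keyphrases(candidates, glossary)
learn(glossary, keyphrases)
print(keyphrases)
```

In this sketch, "glossary database" is not in the glossary at first but scores through its constituent words, so the learning step adds it as a new entry. A personalized glossary in the sense of the abstract would correspond to starting from a different `glossary` dictionary per user.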

Similar resources

Extracting Conceptual Terms from Medical Documents

Automated biomedical concept recognition is important for biomedical document retrieval and text mining research. In this paper, we describe a two-step concept extraction technique for documents in the biomedical domain. Step one is noun phrase extraction, which automatically extracts noun phrases from medical documents. Extracted noun phrases are used as concept term candidates which beco...

Associating Terms with Text Categories

Discriminating between text articles and automatically classifying documents is an essential task for many applications. With the prevalence of digital documents and the wide use of e-mail and web documents, text categorization is regaining interest and is becoming a central problem in digital text collections. There have been many approaches to solve this problem, mainly from the machine learn...

Topic Trend Detection in Text Collections using Latent Dirichlet Allocation

Algorithms that enable the process of automatically mining distinct topics in document collections have become increasingly important due to their applications in many fields and the extensive growth of the number of documents in many domains. Traditionally, the task of topic discovery has been mainly addressed through algorithms that work on a snapshot view of the documents, which ignores the ...

A Semi-Supervised Incremental Algorithm to Automatically Refine Topical Queries

The quality of the material collected by context-based Web search systems is highly dependent on the vocabulary used to generate the search queries. This paper proposes to apply a semi-supervised algorithm to incrementally learn terms that can help bridge the terminology gap existing between the user’s information needs and the relevant documents’ vocabulary. The learning strategy uses an inc...

Document Re-ranking Based on Automatically Acquired Key Terms in Chinese Information Retrieval

In Information Retrieval, users are in most practical situations more concerned about the precision of the top-ranking documents. In this paper, we propose a method to improve the precision of the top N ranking documents by reordering the documents retrieved in the initial retrieval. To reorder documents, we first automatically extract Global Key Terms from the document set, then use extracted Global Ke...

Publication date: 2005